Iterative Scaled Trust-Region Learning in Krylov Subspaces via Pearlmutter's Implicit Sparse Hessian-Vector Multiply

Authors

  • Eiji Mizutani
  • James Demmel
Abstract

The online incremental gradient (or backpropagation) algorithm is widely considered to be the fastest method for solving large-scale neural-network (NN) learning problems. In contrast, we show that an appropriately implemented iterative batch-mode (or block-mode) learning method can be much faster. For example, it is three times faster in the UCI letter classification problem (26 outputs, 16,000 data items, 6,066 parameters with a two-hidden-layer multilayer perceptron) and 353 times faster in a nonlinear regression problem arising in color recipe prediction (10 outputs, 1,000 data items, 2,210 parameters with a neuro-fuzzy modular network). The three principal innovative ingredients in our algorithm are the following: First, we use scaled trust-region regularization with inner-outer iteration to solve the associated “overdetermined” nonlinear least squares problem, where the inner iteration performs a truncated (or inexact) Newton method. Second, we employ Pearlmutter’s implicit sparse Hessian matrix-vector multiply algorithm to construct the Krylov subspaces used to solve for the truncated Newton update. Third, we exploit sparsity (for preconditioning) in the matrices resulting from the NNs having many outputs.
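
As a concrete sketch of the second ingredient, the block below computes a Hessian-vector product via Pearlmutter's trick (the directional derivative of the gradient along a vector) and uses it to drive a few conjugate-gradient iterations, one standard way to realize a truncated Newton step in a Krylov subspace. This is a minimal illustration in JAX, not the authors' implementation: it assumes the network parameters are flattened into a single vector and that `loss` maps that vector to a scalar, and it omits the paper's trust-region scaling, preconditioning, and negative-curvature handling.

```python
import jax
import jax.numpy as jnp

def hvp(loss, params, v):
    # Pearlmutter's trick: H @ v is the directional derivative of the
    # gradient along v, so one forward-mode pass over the reverse-mode
    # gradient yields the product without ever forming H.
    return jax.jvp(jax.grad(loss), (params,), (v,))[1]

def truncated_newton_step(loss, params, max_iters=10, tol=1e-4):
    # Inexact (truncated) Newton: a few CG iterations on H p = -g.
    # Iteration k searches span{g, Hg, ..., H^{k-1} g}, the Krylov
    # subspace in which the regularized step is then computed.
    # NOTE: no safeguard against negative curvature (d @ Hd <= 0);
    # the paper's trust-region machinery handles that case.
    g = jax.grad(loss)(params)
    p = jnp.zeros_like(g)
    r = -g                      # residual of H p = -g at p = 0
    d = r
    rs = r @ r
    for _ in range(max_iters):
        Hd = hvp(loss, params, d)
        alpha = rs / (d @ Hd)
        p = p + alpha * d
        r = r - alpha * Hd
        rs_new = r @ r
        if jnp.sqrt(rs_new) < tol:
            break
        d = r + (rs_new / rs) * d
        rs = rs_new
    return p
```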


Related articles

On Iterative Krylov-Dogleg Trust-Region Steps for Solving Neural Networks Nonlinear Least Squares Problems

This paper describes a method of dogleg trust-region steps, or restricted Levenberg-Marquardt steps, based on a projection process onto the Krylov subspaces for neural networks nonlinear least squares problems. In particular, the linear conjugate gradient (CG) method works as the inner iterative algorithm for solving the linearized Gauss-Newton normal equation, whereas the outer nonlinear algor...

Full text
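
The inner CG iteration described above needs products with the Gauss-Newton matrix J^T J, where J is the Jacobian of the residual vector. A minimal JAX sketch (assuming a hypothetical `residual` function mapping a flat parameter vector to the stacked residuals) composes one forward-mode and one reverse-mode pass so that J is never formed explicitly:

```python
import jax

def gauss_newton_matvec(residual, params):
    # Return v -> (J^T J) v for the residual map r(params),
    # composing J v (forward mode) with J^T (reverse mode)
    # so the Jacobian J is never materialized.
    _, vjp = jax.vjp(residual, params)
    def matvec(v):
        Jv = jax.jvp(residual, (params,), (v,))[1]   # J v
        return vjp(Jv)[0]                            # J^T (J v)
    return matvec
```

Feeding this operator into a CG loop such as the one sketched earlier yields the truncated Gauss-Newton direction; the dogleg step then interpolates between the steepest-descent (Cauchy) point and that direction inside the trust region.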

Image Classification via Sparse Representation and Subspace Alignment

Image representation is a crucial problem in image processing, where many low-level representations of an image exist, e.g., SIFT, HOG, and so on. But there is a missing link between low-level and high-level semantic representations. In fact, traditional machine learning approaches, e.g., non-negative matrix factorization, sparse representation, and principal component analysis, are employed to d...

Full text

Arnoldi-based Sampling for High-dimensional Optimization using Imperfect Data

We present a sampling strategy suitable for optimization problems characterized by high-dimensional design spaces and noisy outputs. Such outputs can arise, for example, in time-averaged objectives that depend on chaotic states. The proposed sampling method is based on a generalization of Arnoldi’s method used in Krylov iterative methods. We show that Arnoldi-based sampling can effectively esti...

Full text
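
For reference, the plain Arnoldi process that this sampling strategy generalizes can be sketched as follows; `matvec` is assumed to apply the (possibly nonsymmetric) matrix A, and no breakdown handling is attempted:

```python
import jax.numpy as jnp

def arnoldi(matvec, b, m):
    # m steps of Arnoldi with modified Gram-Schmidt: the columns of Q
    # orthonormally span {b, Ab, ..., A^{m-1} b}; H is the (m+1) x m
    # upper-Hessenberg matrix satisfying A Q[:, :m] = Q @ H.
    # Assumes H[j+1, j] stays nonzero (no breakdown).
    n = b.shape[0]
    Q = jnp.zeros((n, m + 1)).at[:, 0].set(b / jnp.linalg.norm(b))
    H = jnp.zeros((m + 1, m))
    for j in range(m):
        w = matvec(Q[:, j])
        for i in range(j + 1):
            H = H.at[i, j].set(Q[:, i] @ w)
            w = w - H[i, j] * Q[:, i]
        H = H.at[j + 1, j].set(jnp.linalg.norm(w))
        Q = Q.at[:, j + 1].set(w / H[j + 1, j])
    return Q, H
```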

Block Krylov Space Methods for Linear Systems with Multiple Right-hand Sides: an Introduction

In a number of applications in scientific computing and engineering one has to solve huge sparse linear systems of equations with several right-hand sides that are given at once. Block Krylov space solvers are iterative methods that are especially designed for such problems and have fundamental advantages over the corresponding methods for systems with a single right-hand side: much larger sear...

Full text
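
A minimal sketch of a block conjugate-gradient iteration (in the style of O'Leary's block CG) for a symmetric positive-definite A illustrates the point; `matvec` is assumed to apply A to an n-by-s block of vectors at once, so each iteration costs one block multiply while the search space grows by s directions:

```python
import jax.numpy as jnp

def block_cg(matvec, B, iters=50):
    # Solve A X = B for s right-hand sides at once (B is n x s, A SPD).
    # alpha and beta are s x s matrices replacing the scalar CG
    # coefficients; converged or linearly dependent columns are not
    # deflated in this sketch.
    X = jnp.zeros_like(B)
    R = B                      # residual block, since X = 0
    P = R
    for _ in range(iters):
        AP = matvec(P)         # one block multiply serves all s directions
        alpha = jnp.linalg.solve(P.T @ AP, R.T @ R)
        X = X + P @ alpha
        R_new = R - AP @ alpha
        beta = jnp.linalg.solve(R.T @ R, R_new.T @ R_new)
        P = R_new + P @ beta
        R = R_new
    return X
```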

Recursion Relations for the Extended Krylov Subspace Method

The evaluation of matrix functions of the form f(A)v, where A is a large sparse or structured symmetric matrix, f is a nonlinear function, and v is a vector, is frequently subdivided into two steps: first an orthonormal basis of an extended Krylov subspace of fairly small dimension is determined, and then a projection onto this subspace is evaluated by a method designed for small prob...

Full text
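
The two-step pattern described here can be sketched with the standard Krylov subspace (the paper's extended subspace additionally mixes in powers of A^{-1}, which this sketch omits). It assumes A is symmetric, `matvec` applies A, and `f` acts elementwise on eigenvalues:

```python
import jax.numpy as jnp

def f_of_A_times_v(matvec, v, f, m):
    # Step 1: m Lanczos steps build an orthonormal basis Q of the Krylov
    # subspace and the tridiagonal projection T = Q^T A Q.
    # Step 2: f(A) v ~= ||v|| * Q @ f(T) @ e1, with f(T) evaluated via an
    # eigendecomposition of the small matrix T. No breakdown handling.
    n = v.shape[0]
    Q = jnp.zeros((n, m))
    alpha = jnp.zeros(m)
    beta = jnp.zeros(m)
    q, q_prev, b = v / jnp.linalg.norm(v), jnp.zeros(n), 0.0
    for j in range(m):
        Q = Q.at[:, j].set(q)
        w = matvec(q) - b * q_prev    # three-term Lanczos recurrence
        a = q @ w
        w = w - a * q
        alpha = alpha.at[j].set(a)
        b = jnp.linalg.norm(w)
        if j < m - 1:
            beta = beta.at[j].set(b)
            q_prev, q = q, w / b
    T = (jnp.diag(alpha)
         + jnp.diag(beta[:m - 1], 1)
         + jnp.diag(beta[:m - 1], -1))
    evals, evecs = jnp.linalg.eigh(T)
    fT_e1 = evecs @ (f(evals) * evecs[0, :])   # f(T) applied to e1
    return jnp.linalg.norm(v) * (Q @ fT_e1)
```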


Journal:

Volume   Issue

Pages  -

Publication date: 2003